See “R/wrangle_raw_data.R” for the script that handles the data import. This Rmd file focuses on the models themselves.
For reproducibility purposes, use set.seed().
set.seed(27)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## mean_count_pot ~ s(dist, k = 5)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.08024 0.04751 22.74 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(dist) 3.926 3.996 78.4 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.482 Deviance explained = 48.8%
## GCV = 0.76522 Scale est. = 0.75394 n = 334
##
## Family: gaussian
## Link function: identity
##
## Formula:
## mean_count_pot ~ sum_rain + s(dist, k = 5)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.816921 0.090724 9.004 < 2e-16 ***
## sum_rain 0.028779 0.008496 3.387 0.000792 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(dist) 3.928 3.996 80.95 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.498 Deviance explained = 50.5%
## GCV = 0.74389 Scale est. = 0.73069 n = 334
##
## Family: gaussian
## Link function: identity
##
## Formula:
## mean_count_pot ~ mws + s(dist, k = 5)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.64401 0.11825 5.446 1.01e-07 ***
## mws 0.12273 0.03059 4.012 7.47e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(dist) 3.929 3.996 81.99 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.504 Deviance explained = 51.2%
## GCV = 0.73389 Scale est. = 0.72086 n = 334
##
## Family: gaussian
## Link function: identity
##
## Formula:
## mean_count_pot ~ sum_rain + mws + s(dist, k = 5)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.473449 0.131519 3.600 0.000368 ***
## sum_rain 0.024009 0.008457 2.839 0.004808 **
## mws 0.108916 0.030659 3.552 0.000438 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(dist) 3.93 3.996 83.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.515 Deviance explained = 52.3%
## GCV = 0.72064 Scale est. = 0.70569 n = 334
## Warning in term[i] <- attr(terms(reformulate(term[i])), "term.labels"): number
## of items to replace is not a multiple of replacement length
##
## Family: gaussian
## Link function: identity
##
## Formula:
## mean_count_pot ~ sum_rain + s(dist + mws, k = 5)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.816921 0.090724 9.004 < 2e-16 ***
## sum_rain 0.028779 0.008496 3.387 0.000792 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(dist) 3.928 3.996 80.95 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.498 Deviance explained = 50.5%
## GCV = 0.74389 Scale est. = 0.73069 n = 334
##
## Family: gaussian
## Link function: identity
##
## Formula:
## mean_count_pot ~ sum_rain + s(dist, k = 5) + s(mws, k = 5)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.37885 0.16680 8.267 3.59e-15 ***
## sum_rain -0.03264 0.01760 -1.854 0.0646 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(dist) 3.938 3.997 93.81 < 2e-16 ***
## s(mws) 3.928 3.996 13.56 2.09e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.566 Deviance explained = 57.8%
## GCV = 0.6497 Scale est. = 0.63051 n = 334
##
## Family: gaussian
## Link function: identity
##
## Formula:
## mean_count_pot ~ s(dist, k = 5) + s(mws, k = 5)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.08024 0.04366 24.74 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(dist) 3.937 3.997 92.96 < 2e-16 ***
## s(mws) 3.917 3.995 16.04 6.03e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.562 Deviance explained = 57.3%
## GCV = 0.65392 Scale est. = 0.63659 n = 334
##
## Family: gaussian
## Link function: identity
##
## Formula:
## mean_count_pot ~ s(dist, k = 5) + s(mws, k = 5) + s(sum_rain,
## k = 5)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.08024 0.04345 24.86 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(dist) 3.938 3.997 93.805 <2e-16 ***
## s(mws) 1.655 1.748 1.351 0.3360
## s(sum_rain) 3.203 3.232 3.278 0.0114 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.566 Deviance explained = 57.8%
## GCV = 0.64956 Scale est. = 0.63051 n = 334
##
## Family: gaussian
## Link function: identity
##
## Formula:
## mean_count_pot ~ s(dist, k = 5) + s(sum_rain, k = 5)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.08024 0.04393 24.59 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(dist) 3.936 3.997 91.78 < 2e-16 ***
## s(sum_rain) 3.890 3.989 15.22 5.69e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.557 Deviance explained = 56.7%
## GCV = 0.66193 Scale est. = 0.64444 n = 334
##
## Family: gaussian
## Link function: identity
##
## Formula:
## mean_count_pot ~ s(dist, k = 5) + s(sum_rain, k = 5) + mws
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.7406 1.4576 -1.880 0.06097 .
## mws 1.0750 0.4099 2.623 0.00914 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(dist) 3.937 3.997 93.79 < 2e-16 ***
## s(sum_rain) 3.813 3.965 11.28 8.12e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.566 Deviance explained = 57.8%
## GCV = 0.64959 Scale est. = 0.63062 n = 334
This is the same as mod8 but using family = tw(), see ?family.mgcv for more on the families. The Tweedie distribution is used where the distribution has a positive mass at zero, but is continuous unlike the Poisson distribution that requires count data. The data visualisation shows clearly that the mean pot count data have this shape.
##
## Family: Tweedie(p=1.044)
## Link function: log
##
## Formula:
## mean_count_pot ~ s(dist, k = 5) + s(mws, k = 5) + s(sum_rain,
## k = 5)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.22821 0.04098 -5.569 5.39e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(dist) 3.496 3.855 123.776 <2e-16 ***
## s(mws) 1.980 2.079 0.772 0.487
## s(sum_rain) 2.830 2.894 5.509 0.002 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.674 Deviance explained = 61.2%
## -REML = 310.11 Scale est. = 0.36397 n = 334
Try fitting the same model using a different shrinkage smoother, ts is thin- plate splines.
##
## Family: Tweedie(p=1.044)
## Link function: log
##
## Formula:
## mean_count_pot ~ s(dist, k = 5, bs = "ts") + s(mws, k = 5, bs = "ts") +
## s(sum_rain, k = 5, bs = "ts")
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.22198 0.04089 -5.429 1.11e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(dist) 3.2481 4 117.660 < 2e-16 ***
## s(mws) 0.8917 4 2.005 0.000565 ***
## s(sum_rain) 2.8824 4 15.841 2.24e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.657 Deviance explained = 60%
## -REML = 319.66 Scale est. = 0.36505 n = 334
This model uses
Looking at mod11.1, mws is not significant as a smoothed term, and we’d suspect it was linear anyway, so add it as a linear term only.
##
## Family: Tweedie(p=1.044)
## Link function: log
##
## Formula:
## mean_count_pot ~ s(dist, k = 5) + s(sum_rain, k = 5) + mws
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.45815 0.09542 -4.801 2.41e-06 ***
## mws 0.06588 0.02373 2.776 0.00582 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(dist) 3.496 3.854 123.47 < 2e-16 ***
## s(sum_rain) 2.907 2.996 19.44 1.04e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.661 Deviance explained = 60.2%
## -REML = 312.88 Scale est. = 0.36493 n = 334
## # A tibble: 13 x 7
## model df logLik AIC BIC deviance df.residual
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 mod11.0 9.31 -320. 663. 709. 141. 325.
## 2 mod12 8.40 -332. 686. 728. 144. 326.
## 3 mod11.1 8.02 -334. 689. 730. 145. 326.
## 4 mod8 9.80 -392. 805. 847. 204. 324.
## 5 mod10 9.75 -392. 805. 846. 204. 324.
## 6 mod6 9.87 -392. 806. 847. 204. 324.
## 7 mod7 8.85 -394. 808. 845. 207. 325.
## 8 mod9 8.83 -396. 812. 849. 210. 325.
## 9 mod4 6.93 -412. 840. 871. 231. 327.
## 10 mod3 5.93 -416. 846. 873. 236. 328.
## 11 mod2 5.93 -419. 851. 877. 240. 328.
## 12 mod5 5.93 -419. 851. 877. 240. 328.
## 13 mod1 4.93 -424. 860. 883. 248. 329.
## # A tibble: 13 x 2
## name value
## <chr> <dbl>
## 1 mod11.0 0.674
## 2 mod12 0.661
## 3 mod11.1 0.657
## 4 mod8 0.566
## 5 mod6 0.566
## 6 mod10 0.566
## 7 mod7 0.562
## 8 mod9 0.557
## 9 mod4 0.515
## 10 mod3 0.504
## 11 mod2 0.498
## 12 mod5 0.498
## 13 mod1 0.482
##
## mod1
## (Intercept) s(dist).1 s(dist).2 s(dist).3 s(dist).4
## 1.080240 -13.444330 7.368783 -1.254080 -2.291845
##
## mod2
## (Intercept) sum_rain s(dist).1 s(dist).2 s(dist).3 s(dist).4
## 0.81692128 0.02877889 -13.40978000 7.33464779 -1.25209090 -2.29384111
##
## mod3
## (Intercept) mws s(dist).1 s(dist).2 s(dist).3 s(dist).4
## 0.6440143 0.1227341 -13.4868763 7.3983152 -1.2560432 -2.2949104
##
## mod4
## (Intercept) sum_rain mws s(dist).1 s(dist).2 s(dist).3
## 0.47344866 0.02400933 0.10891607 -13.45271077 7.36612892 -1.25413637
## s(dist).4
## -2.29619141
##
## mod5
## (Intercept) sum_rain s(dist).1 s(dist).2 s(dist).3 s(dist).4
## 0.81692128 0.02877889 -13.40978000 7.33464779 -1.25209090 -2.29384111
##
## mod6
## (Intercept) sum_rain s(dist).1 s(dist).2 s(dist).3 s(dist).4
## 1.37884978 -0.03263607 -13.52238405 7.40993482 -1.25717173 -2.30322423
## s(mws).1 s(mws).2 s(mws).3 s(mws).4
## 0.99626476 -10.70707340 6.66914918 0.59636619
##
## mod7
## (Intercept) s(dist).1 s(dist).2 s(dist).3 s(dist).4 s(mws).1
## 1.0802395 -13.4771686 7.3732417 -1.2548767 -2.3023128 0.8682278
## s(mws).2 s(mws).3 s(mws).4
## -6.5428897 3.8305035 0.6604538
##
## mod8
## (Intercept) s(dist).1 s(dist).2 s(dist).3 s(dist).4
## 1.0802395 -13.5231892 7.4108603 -1.2572233 -2.3031202
## s(mws).1 s(mws).2 s(mws).3 s(mws).4 s(sum_rain).1
## 0.3909139 -0.7059823 1.4344954 0.8082756 19.7199870
## s(sum_rain).2 s(sum_rain).3 s(sum_rain).4
## 328.5020104 -102.4610960 1.3525266
##
## mod9
## (Intercept) s(dist).1 s(dist).2 s(dist).3 s(dist).4
## 1.080240 -13.505613 7.398727 -1.256415 -2.301825
## s(sum_rain).1 s(sum_rain).2 s(sum_rain).3 s(sum_rain).4
## -7.708430 -109.360856 34.329152 -1.330657
##
## mod10
## (Intercept) mws s(dist).1 s(dist).2 s(dist).3
## -2.740619 1.075017 -13.510016 7.401753 -1.256617
## s(dist).4 s(sum_rain).1 s(sum_rain).2 s(sum_rain).3 s(sum_rain).4
## -2.302155 82.075300 1298.724127 -407.240141 8.651373
##
## mod11.0
## (Intercept) s(dist).1 s(dist).2 s(dist).3 s(dist).4
## -0.2282130 -3.7637074 1.8984225 -0.5265088 -1.1208624
## s(mws).1 s(mws).2 s(mws).3 s(mws).4 s(sum_rain).1
## 0.5077483 -1.0676679 1.5218322 0.4780137 -0.9317942
## s(sum_rain).2 s(sum_rain).3 s(sum_rain).4
## 5.0627315 -1.2859969 -0.9520154
##
## mod11.1
## (Intercept) s(dist).1 s(dist).2 s(dist).3 s(dist).4
## -2.219778e-01 -2.921037e+00 1.309050e+00 -4.875628e-01 -1.064831e+00
## s(mws).1 s(mws).2 s(mws).3 s(mws).4 s(sum_rain).1
## 6.549552e-06 -6.671605e-04 1.691059e-03 8.920753e-02 -6.612916e-01
## s(sum_rain).2 s(sum_rain).3 s(sum_rain).4
## 1.173167e+00 -4.027855e-01 -5.312479e-01
##
## mod12
## (Intercept) mws s(dist).1 s(dist).2 s(dist).3
## -0.45814928 0.06587563 -3.76782363 1.90297824 -0.52681818
## s(dist).4 s(sum_rain).1 s(sum_rain).2 s(sum_rain).3 s(sum_rain).4
## -1.12066486 -0.46302990 4.76460329 -1.55284478 -0.53564456
## Analysis of Deviance Table
##
## Model 1: mean_count_pot ~ s(dist, k = 5)
## Model 2: mean_count_pot ~ sum_rain + s(dist, k = 5)
## Model 3: mean_count_pot ~ mws + s(dist, k = 5)
## Model 4: mean_count_pot ~ sum_rain + mws + s(dist, k = 5)
## Model 5: mean_count_pot ~ sum_rain + s(dist + mws, k = 5)
## Model 6: mean_count_pot ~ sum_rain + s(dist, k = 5) + s(mws, k = 5)
## Model 7: mean_count_pot ~ s(dist, k = 5) + s(mws, k = 5)
## Model 8: mean_count_pot ~ s(dist, k = 5) + s(mws, k = 5) + s(sum_rain,
## k = 5)
## Model 9: mean_count_pot ~ s(dist, k = 5) + s(sum_rain, k = 5)
## Model 10: mean_count_pot ~ s(dist, k = 5) + s(sum_rain, k = 5) + mws
## Model 11: mean_count_pot ~ s(dist, k = 5) + s(mws, k = 5) + s(sum_rain,
## k = 5)
## Model 12: mean_count_pot ~ s(dist, k = 5, bs = "ts") + s(mws, k = 5, bs = "ts") +
## s(sum_rain, k = 5, bs = "ts")
## Model 13: mean_count_pot ~ s(dist, k = 5) + s(sum_rain, k = 5) + mws
## Resid. Df Resid. Dev Df Deviance F Pr(>F)
## 1 329.00 248.10
## 2 328.00 239.72 1.00019797 8.38 23.0239 2.443e-06 ***
## 3 328.00 236.49 0.00017108 3.22 51787.2571 2.104e-07 ***
## 4 327.00 230.81 1.00011438 5.69 15.6198 9.506e-05 ***
## 5 328.00 239.72 -1.00028547 -8.91 24.4745 1.209e-06 ***
## 6 324.01 204.37 3.99671315 35.35 24.3000 < 2.2e-16 ***
## 7 325.01 206.98 -1.00100428 -2.61 7.1723 0.0077645 **
## 8 324.02 204.41 0.98528685 2.57 7.1674 0.0080586 **
## 9 325.01 209.55 -0.99146246 -5.14 14.2473 0.0001994 ***
## 10 324.04 204.48 0.97643885 5.07 14.2777 0.0002125 ***
## 11 323.65 639.81 0.38652451 -435.32
## 12 324.67 667.59 -1.02220468 -27.78 74.6727 < 2.2e-16 ***
## 13 324.70 664.47 -0.02920554 3.12
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Method: REML Optimizer: outer newton
## full convergence after 9 iterations.
## Gradient range [-0.0002827365,-8.478081e-09]
## (score 310.1067 & scale 0.363966).
## Hessian positive definite, eigenvalue range [0.367564,2978.833].
## Model rank = 13 / 13
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(dist) 4.00 3.50 0.86 0.010 **
## s(mws) 4.00 1.98 0.89 0.050 *
## s(sum_rain) 4.00 2.83 0.91 0.065 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This model, mod11.0, mean_pot_count ~ s(Distance) + s(Windspeed) + s(Precipitation) - family = tw(), lacks enough numbers of basis functions for two of three predictors, dist, wsp, but it’s close. Realistically we need to increase the k values for these predictors, but we’re lacking the data to do this.